COMPUTER-AIDED PROBABILITY BASE
CALLING FOR ARRAYS OF NUCLEIC ACID 
PROBES ON CHIPS 

GOVERNMENT RIGHTS NOTICE 

Portions of the material in this specification arose under
the cooperative agreement 70NANB5H1031 between
Affymetrix, Inc. and the Department of Commerce through
the National Institute of Standards and Technology.

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the xerographic
reproduction by anyone of the patent document or the patent
disclosure in exactly the form it appears in the Patent and
Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.

SOFTWARE APPENDIX 
A Software Appendix comprising twenty-one (21) sheets
is included herewith.

BACKGROUND OF THE INVENTION 

The present invention relates to the field of computer
systems. More specifically, the present invention relates to
computer systems for evaluating and comparing biological



Devices and computer systems for forming and using
arrays of materials on a substrate are known. For example,
PCT application WO92/10588, incorporated herein by
reference for all purposes, describes techniques for sequencing
or sequence checking nucleic acids and other materials.
Arrays for performing these operations may be formed in
arrays according to the methods of, for example, the
pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and
U.S. patent application Ser. No. 08/249,188, both
incorporated herein by reference for all purposes.

According to one aspect of the techniques described
herein, an array of nucleic acid probes is fabricated at
known locations on a chip or substrate. A fluorescently



(up to 98.5%). At the same time, confidence information
may be provided that indicates the likelihood that the base
has been called correctly. The methods of the present
invention are robust and uniformly optimal regardless of the
experimental conditions.

According to one aspect of the invention, a computer 
system is used to identify an unknown base in a sample 
nucleic acid sequence by the steps of: inputting a plurality of 
hybridization probe intensities, each of the probe intensities 
corresponding to a nucleic acid probe; for each of the 
plurality of probe intensities, determining a probability that 
the corresponding nucleic acid probe best hybridizes with
the sample nucleic acid sequence; and calling the unknown 
base according to the nucleic acid probe with the highest 
associated probability. 
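
The steps of this aspect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `call_base` is hypothetical, and the probability model used here (each intensity divided by the sum of all intensities) is a simple stand-in, not the probability computation of the invention, which this passage does not specify.

```python
def call_base(intensities):
    """Call an unknown base from hybridization probe intensities.

    intensities: dict mapping each candidate base ("A", "C", "G", "T")
    to the intensity of the probe that interrogates that base.
    Returns (called_base, probability).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")

    # ASSUMPTION: normalized intensity is used as a placeholder for the
    # probability that a probe best hybridizes with the sample.
    probabilities = {base: value / total for base, value in intensities.items()}

    # Call the base whose probe has the highest associated probability.
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


if __name__ == "__main__":
    base, prob = call_base({"A": 120.0, "C": 30.0, "G": 30.0, "T": 20.0})
    print(base, round(prob, 3))
```

The returned probability doubles as the confidence information mentioned above; any other model that maps intensities to probabilities could be substituted without changing the final selection step.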
According to another aspect of the invention, an unknown
base in a sample nucleic acid sequence is called by a base
call with the highest probability of correctly calling the
unknown base. The unknown base in the sample nucleic acid
sequence is identified by the steps of: inputting multiple base
calls for the unknown base, each of the base calls having an
associated probability which represents a confidence that the
unknown base is called correctly; selecting a base call that
has a highest associated probability; and calling the
unknown base according to the selected base call. The
multiple base calls are typically produced from multiple
experiments. The multiple experiments may be performed
on the same chip utilizing different parameters (e.g., nucleic
acid probe length).
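
As a minimal sketch of the selection step just described, assume each experiment yields one base call as a `(base, probability)` pair. The function name `select_best_call` and the sample values are hypothetical and serve only to illustrate the idea.

```python
def select_best_call(base_calls):
    """Select the base call with the highest associated probability.

    base_calls: list of (base, probability) tuples, one per experiment.
    Returns the (base, probability) tuple with the largest probability.
    """
    if not base_calls:
        raise ValueError("at least one base call is required")
    # The probability of each call is its confidence of being correct,
    # so the most confident call is selected.
    return max(base_calls, key=lambda call: call[1])


if __name__ == "__main__":
    # Hypothetical calls from three experiments (e.g., different probe lengths).
    calls = [("A", 0.81), ("G", 0.93), ("A", 0.88)]
    print(select_best_call(calls))
```

Note that this approach lets a single confident experiment override several weaker ones that agree with each other, as in the sample data above.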
According to yet another aspect of the invention, an
unknown base in a sample nucleic acid sequence is called
according to multiple base calls that collectively have the
highest probability of correctly calling the unknown base.
The unknown base in the sample nucleic acid sequence is
identified by the steps of: inputting multiple probabilities for
each possible base for the unknown base, each of the
probabilities representing a probability that the unknown
base is an associated base; producing a product of
probabilities for each possible base, each product being associated
with a possible base; and calling the unknown base
according to a base associated with a highest product. The multiple
base calls are typically produced from multiple experiments.
The multiple experiments may be performed on the same
chip utilizing different parameters (e.g., nucleic acid probe
length).
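
The product-of-probabilities steps can be illustrated as follows. This is a sketch under assumptions: each experiment is represented as a dict giving a probability for every possible base, and the name `call_by_product` and the sample values are hypothetical.

```python
import math


def call_by_product(experiments):
    """Call a base from per-base probabilities of several experiments.

    experiments: list of dicts, each mapping every possible base to the
    probability (from one experiment) that the unknown base is that base.
    Returns (called_base, products), where products maps each base to the
    product of its probabilities across all experiments.
    """
    if not experiments:
        raise ValueError("at least one experiment is required")
    bases = list(experiments[0].keys())

    # Produce one product of probabilities per possible base.
    products = {
        base: math.prod(exp[base] for exp in experiments) for base in bases
    }

    # Call the base associated with the highest product.
    best = max(products, key=products.get)
    return best, products


if __name__ == "__main__":
    experiments = [
        {"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1},
        {"A": 0.3, "C": 0.5, "G": 0.1, "T": 0.1},
    ]
    base, products = call_by_product(experiments)
    print(base, products)
```

In the sample data the two experiments disagree on the most likely base, yet the product for A (0.6 × 0.3) exceeds the product for C (0.2 × 0.5), so A is called. With many experiments, summing logarithms instead of multiplying would avoid numerical underflow.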

labeled nucleic acid is then brought into contact with the
[...] sequence of DNA or RNA. Such systems have been used
to form, for example, arrays of DNA that may be used to
study and detect mutations relevant to cystic fibrosis, the
P53 gene (relevant to certain cancers), HIV, and other
genetic characteristics.

Innovative computer-aided techniques for base calling are
disclosed in U.S. patent application Ser. No. 08/327,325,
which is incorporated by reference for all purposes.
However, improved computer systems and methods are still
needed to evaluate, analyze, and process the vast amount of
information now used and made available by these
pioneering technologies.

SUMMARY OF THE INVENTION

An improved computer-aided system for calling unknown
bases in sample nucleic acid sequences from multiple
nucleic acid probe intensities is disclosed. The present
invention is able to call bases with extremely high accuracy

According to another aspect of the invention, both strands
[...] call for the unknown base, the second base call
determined from a second nucleic acid probe that is
complementary to a portion of the sample nucleic acid
sequence including the unknown base; selecting one of the
[...] nucleic acid probes that has a base at an
interrogation position which has a high probability of
producing correct base calls; and calling the unknown base
according to the selected one of the first [...] nucleic
acid probes.

A further understanding of the nature and advantages of 
the inventions herein may be realized by reference to the
remaining portions of the specification and the attached 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 illustrates an example of a computer system used 
to execute the software of the present invention;



